Democratizing Access to Education Data

The Urban Institute’s Education Data Portal

Erika Tyagi

The Education Data Portal bridges the gap between data availability and data accessibility.

  1. What do I mean by the availability-accessibility gap?
  2. How does the portal bridge this gap so effectively?
  3. Why does bridging this gap matter?

The Education Data Portal

  • What? A freely available one-stop shop for 100+ datasets released by government agencies and other institutions on schools, districts, and colleges in the U.S.
  • Why? To make it easier for both technical and non-technical users to look at trends over time and combine data from different sources

What do I mean by the availability-accessibility gap?

Without the Education Data Portal…

Example: How has tuition at my alma mater risen over my lifetime?

  • Find the agency collecting the data
  • Read the data documentation
  • Download data files for each year
  • Load each file into R or Python
  • Notice anomalies in the data
  • Re-read the data documentation
  • Update the code per the documentation
  • Remember to repeat the process again next year
  • (And hope nothing changes)

This is tedious, error-prone, and not fun.

Using the portal R package

Example: How has tuition at my alma mater risen over my lifetime?

library(educationdata)
library(tidyverse)  # provides %>% and ggplot()

# Get data 
data <- get_education_data(
  level = "college-university",
  source = "ipeds",
  topic = "academic-year-tuition",
  filters = list(
    year = c(1990:2020), 
    unitid = "173258", 
    tuition_type = "4"
  )
)

# Plot data 
data %>%
  ggplot(aes(x = year, y = tuition_fees_ft)) +
  geom_line()

Using the portal Python package

Example: How has tuition at my alma mater risen over my lifetime?

import educationdata

# Get data
data = educationdata.get_education_data(
  level = "college-university",
  source = "ipeds",
  topic = "academic-year-tuition",
  filters = {
    "year": range(1990, 2021),  # 1990 through 2020 inclusive, matching the R example
    "unitid": "173258",
    "tuition_type": "4"
  }
)

# Plot data 
data.plot.line(
  x = "year", y = "tuition_fees_ft"
)

The Python package is not yet publicly available.

Using the portal Stata package

Example: How has tuition at my alma mater risen over my lifetime?

* Get data 
educationdata using ///
  "college ipeds academic-year-tuition", sub( ///
  year=1990/2020 ///
  unitid=173258 ///
  tuition_type=4 ///
)

* Plot data 
twoway (line tuition_fees_ft year)

Using the portal Data Explorer

Example: How has tuition at my alma mater risen over my lifetime?

How does the portal bridge this gap so effectively?

  1. By focusing on the underlying API
  2. By focusing on data documentation

The underlying API

Provides the foundation of the portal

  • 100+ data endpoints (with the data)
  • 12+ metadata endpoints (about the data)
  • All other tools, packages, and documentation are built from these endpoints
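
A minimal Python sketch of how a client might talk to these endpoints directly. It assumes data endpoints follow a level/source/topic/year URL pattern and that responses are paged with the "results" and "next" fields that appear in the portal's JSON output. The helper names (build_url, fetch_json, iter_records) are hypothetical and are not part of any portal package.

```python
import json
import urllib.request
from typing import Callable, Iterator
from urllib.parse import urlencode

BASE_URL = "https://educationdata.urban.org/api/v1"


def build_url(level: str, source: str, topic: str, year: int, **filters) -> str:
    """Build a data endpoint URL (assumed pattern: level/source/topic/year)."""
    path = f"{BASE_URL}/{level}/{source}/{topic}/{year}/"
    query = urlencode(filters)
    return f"{path}?{query}" if query else path


def fetch_json(url: str) -> dict:
    """Download one page of results and decode it as JSON."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def iter_records(url: str, fetch: Callable[[str], dict] = fetch_json) -> Iterator[dict]:
    """Yield every record, following the "next" link until it is null."""
    next_url = url
    while next_url:
        page = fetch(next_url)
        yield from page["results"]
        next_url = page.get("next")
```

Usage would look like `list(iter_records(build_url("college-university", "ipeds", "academic-year-tuition", 2020, unitid=173258)))`. Passing a different `fetch` function makes the paging logic easy to try without a network connection.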

Data documentation

Considered a first-order feature of the portal

  • Written for both humans and machines
  • Provides the user with details on demand

Data documentation

Written for both humans and machines

{
  "count": 1,
  "next": null,
  "previous": null,
  "results": [
    {
      "variable": "urban_centric_locale",
      "label": "Degree of urbanization (urban-centric locale)",
      "format": "urban_centric_locale",
      "data_type": "integer",
      "values": "{
          1: '1 - Large city', 
          2: '2 - Midsize city', 
          3: '3 - Urban fringe of large city', 
          [...]
      }",
      [...]
    }
  ]
}

https://educationdata.urban.org/api/v1/api-variables/?variable=urban_centric_locale
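
A short Python sketch of what "written for machines" can enable: turning the "values" field of a metadata record into a lookup table for labeling coded data. It assumes the field is a string holding a Python-style dictionary literal, as in the excerpt above; the real format may differ. The sample record contains only the three codes shown in the excerpt, and the function names are hypothetical.

```python
import ast


def parse_value_labels(values_field: str) -> dict:
    """Parse the "values" string into a {code: label} dictionary.

    Assumes the string is a dictionary literal such as "{1: '1 - Large city'}".
    """
    labels = ast.literal_eval(values_field)
    if not isinstance(labels, dict):
        raise ValueError("expected a dictionary of code-to-label pairs")
    return labels


def label_codes(codes: list, labels: dict) -> list:
    """Replace each code with its label, leaving unknown codes unchanged."""
    return [labels.get(code, code) for code in codes]


# Truncated sample record: only the values visible in the excerpt above.
sample_variable = {
    "variable": "urban_centric_locale",
    "values": "{1: '1 - Large city', 2: '2 - Midsize city', "
              "3: '3 - Urban fringe of large city'}",
}

locale_labels = parse_value_labels(sample_variable["values"])
```

With this lookup, `label_codes([2, 1], locale_labels)` gives the human-readable labels for a column of integer codes, so the same metadata can drive both documentation pages and analysis code.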

Data documentation

Provides the user with details on demand

How does the portal bridge this gap so effectively?

By focusing on the underlying API and data documentation

Through collaboration with education and technology experts

  • Education contributors: Erica Blom, Jay Carter, and Leonardo Restrepo
  • Technology contributors: Ben Chartoff, David D’Orio, Graham MacDonald, Kyle Ueyama, and Vivian Zheng

Why does bridging this gap matter?

Different people ask different questions.

Why does bridging this gap matter?

4K people used the portal last month

Why does bridging this gap matter?

By unlocking data for more people, we can allow more questions to find evidence-based answers and drive impact.

Get in touch